Dataset statistics
| Number of variables | 19 |
|---|---|
| Number of observations | 200000 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 29.0 MiB |
| Average record size in memory | 152.0 B |
Variable types
| Numeric | 10 |
|---|---|
| Categorical | 9 |
Warnings
| price_vnd is highly skewed (γ1 = 235.7020306) | Skewed |
|---|---|
| sim_number has unique values | Unique |
| same_of_a_kind has 182027 (91.0%) zeros | Zeros |
| straight has 189488 (94.7%) zeros | Zeros |
| same_of_a_kind_middle has 170845 (85.4%) zeros | Zeros |
| straight_middle has 188280 (94.1%) zeros | Zeros |
| taxi has 174881 (87.4%) zeros | Zeros |
| reserve has 197897 (98.9%) zeros | Zeros |
| reserve_middle has 191713 (95.9%) zeros | Zeros |
| last_number has 11894 (5.9%) zeros | Zeros |
Reproduction
| Analysis started | 2022-12-10 21:38:32.599290 |
|---|---|
| Analysis finished | 2022-12-10 21:39:06.449285 |
| Duration | 33.85 seconds |
| Software version | pandas-profiling v2.12.0 |
price_vnd
Numeric
| Distinct | 939 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 13950269.82 |
| Minimum | 99000 |
| Maximum | 1.68 × 10¹¹ |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.5 MiB |
Quantile statistics
| Minimum | 99000 |
|---|---|
| 5-th percentile | 450000 |
| Q1 | 500000 |
| median | 1000000 |
| Q3 | 5000000 |
| 95-th percentile | 15000000 |
| Maximum | 1.68 × 10¹¹ |
| Range | 1.67999901 × 10¹¹ |
| Interquartile range (IQR) | 4500000 |
Descriptive statistics
| Standard deviation | 591299676.8 |
|---|---|
| Coefficient of variation (CV) | 42.38625379 |
| Kurtosis | 63791.3299 |
| Mean | 13950269.82 |
| Median Absolute Deviation (MAD) | 550000 |
| Skewness | 235.7020306 |
| Sum | 2.790053963 × 10¹² |
| Variance | 3.496353078 × 10¹⁷ |
| Monotonicity | Not monotonic |
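Several of the derived figures in the price_vnd tables can be recomputed from the others, which makes a quick sanity check on the report. The sketch below copies the reported numbers in by hand (the variable names are illustrative, not part of the report) and assumes that CV is the standard deviation divided by the mean, IQR is Q3 minus Q1, and variance is the squared standard deviation.

```python
# Figures copied by hand from the price_vnd tables; names are illustrative only.
mean = 13950269.82
std_dev = 591299676.8
q1, q3 = 500000, 5000000
minimum, maximum = 99000, 1.68e11  # maximum as displayed in the report (rounded)

cv = std_dev / mean                # coefficient of variation
iqr = q3 - q1                      # interquartile range
variance = std_dev ** 2            # variance as squared standard deviation
value_range = maximum - minimum    # range

print(f"CV       = {cv:.6f}")
print(f"IQR      = {iqr}")
print(f"Variance = {variance:.6e}")
print(f"Range    = {value_range:.8e}")
```

A CV above 42 together with a median far below the mean is consistent with the "highly skewed" warning: a handful of very large prices dominate the mean and the standard deviation.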
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 450000 | 45119 | 22.6% |
| 1000000 | 38472 | 19.2% |
| 500000 | 30619 | 15.3% |
| 3000000 | 29199 | 14.6% |
| 5000000 | 21383 | 10.7% |
| 10000000 | 6139 | 3.1% |
| 12000000 | 5124 | 2.6% |
| 11325000 | 2682 | 1.3% |
| 11000000 | 1760 | 0.9% |
| 13000000 | 1720 | 0.9% |
| Other values (929) | 17783 | 8.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 99000 | 58 | < 0.1% |
| 119000 | 32 | < 0.1% |
| 199000 | 18 | < 0.1% |
| 220000 | 1 | < 0.1% |
| 250000 | 47 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.68 × 10¹¹ | 1 | < 0.1% |
| 1.65 × 10¹¹ | 1 | < 0.1% |
| 5.5 × 10¹⁰ | 1 | < 0.1% |
| 4.9 × 10¹⁰ | 1 | < 0.1% |
| 4.5 × 10¹⁰ | 1 | < 0.1% |
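The Frequency (%) column in these tables appears to be each count divided by the 200000 observations, shown with one decimal place. The sketch below is a hypothetical helper, not the report generator's own code. The "< 0.1%" rule is inferred from the displayed values, and the "> 99.9%" rule is an assumed mirror image of it.

```python
def format_share(count, total):
    """Format count/total as a percentage with one decimal place.

    Assumed edge cases: a non-zero share that would round to 0.0% is shown
    as "< 0.1%", and a share just below 100% is shown as "> 99.9%".
    """
    share = count / total
    if share > 0 and round(share, 3) == 0:
        return "< 0.1%"
    if share < 1 and round(share, 3) == 1:
        return "> 99.9%"
    return f"{share * 100:.1f}%"


N_OBSERVATIONS = 200000  # number of observations from the dataset statistics

for count in (6139, 17783, 18, 1):
    print(count, format_share(count, N_OBSERVATIONS))
```

Under these assumptions the helper reproduces the percentages that are printed in the tables, for example 3.1% for a count of 6139 and "< 0.1%" for a count of 18.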
sim_number
Numeric
| Distinct | 200000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 730479740.9 |
| Minimum | 325009779 |
| Maximum | 997979696 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.5 MiB |
Quantile statistics
| Minimum | 325009779 |
|---|---|
| 5-th percentile | 329600084.4 |
| Q1 | 392260526 |
| median | 833428672.5 |
| Q3 | 918087387.5 |
| 95-th percentile | 977250001 |
| Maximum | 997979696 |
| Range | 672969917 |
| Interquartile range (IQR) | 525826861.5 |
Descriptive statistics
| Standard deviation | 240647935.3 |
|---|---|
| Coefficient of variation (CV) | 0.3294382059 |
| Kurtosis | -1.070182089 |
| Mean | 730479740.9 |
| Median Absolute Deviation (MAD) | 103409268.5 |
| Skewness | -0.7831819706 |
| Sum | 1.460959482 × 10¹⁴ |
| Variance | 5.791142876 × 10¹⁶ |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 343189288 | 1 | < 0.1% |
| 382056969 | 1 | < 0.1% |
| 817212227 | 1 | < 0.1% |
| 877150979 | 1 | < 0.1% |
| 775775531 | 1 | < 0.1% |
| 819857999 | 1 | < 0.1% |
| 989283414 | 1 | < 0.1% |
| 978111043 | 1 | < 0.1% |
| 967937992 | 1 | < 0.1% |
| 946237833 | 1 | < 0.1% |
| Other values (199990) | 199990 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 325009779 | 1 | < 0.1% |
| 325011980 | 1 | < 0.1% |
| 325011981 | 1 | < 0.1% |
| 325012006 | 1 | < 0.1% |
| 325012010 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 997979696 | 1 | < 0.1% |
| 997979191 | 1 | < 0.1% |
| 997968999 | 1 | < 0.1% |
| 997968968 | 1 | < 0.1% |
| 997950999 | 1 | < 0.1% |
provider
Categorical
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.5 MiB |
Length
| Max length | 12 |
|---|---|
| Median length | 8 |
| Mean length | 7.99063 |
| Min length | 7 |
Characters and Unicode
| Total characters | 1598126 |
|---|---|
| Distinct characters | 15 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | viettel |
|---|---|
| 2nd row | vinaphone |
| 3rd row | vietnamobile |
| 4th row | mobifone |
| 5th row | viettel |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| viettel | 88584 | 44.3% |
| mobifone | 51854 | 25.9% |
| vinaphone | 40694 | 20.3% |
| vietnamobile | 11597 | 5.8% |
| itelecom | 6899 | 3.4% |
| gmobile | 372 | 0.2% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 307080 | 19.2% |
| i | 211597 | 13.2% |
| t | 195664 | 12.2% |
| o | 163270 | 10.2% |
| n | 144839 | 9.1% |
| v | 140875 | 8.8% |
| l | 107452 | 6.7% |
| m | 70722 | 4.4% |
| b | 63823 | 4.0% |
| a | 52291 | 3.3% |
| Other values (5) | 140513 | 8.8% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 1598126 | 100.0% |
Most frequent character per category
| Value | Count | Frequency (%) |
|---|---|---|
| e | 307080 | 19.2% |
| i | 211597 | 13.2% |
| t | 195664 | 12.2% |
| o | 163270 | 10.2% |
| n | 144839 | 9.1% |
| v | 140875 | 8.8% |
| l | 107452 | 6.7% |
| m | 70722 | 4.4% |
| b | 63823 | 4.0% |
| a | 52291 | 3.3% |
| Other values (5) | 140513 | 8.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 1598126 | 100.0% |
Most frequent character per script
| Value | Count | Frequency (%) |
|---|---|---|
| e | 307080 | 19.2% |
| i | 211597 | 13.2% |
| t | 195664 | 12.2% |
| o | 163270 | 10.2% |
| n | 144839 | 9.1% |
| v | 140875 | 8.8% |
| l | 107452 | 6.7% |
| m | 70722 | 4.4% |
| b | 63823 | 4.0% |
| a | 52291 | 3.3% |
| Other values (5) | 140513 | 8.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 1598126 | 100.0% |
Most frequent character per block
| Value | Count | Frequency (%) |
|---|---|---|
| e | 307080 | 19.2% |
| i | 211597 | 13.2% |
| t | 195664 | 12.2% |
| o | 163270 | 10.2% |
| n | 144839 | 9.1% |
| v | 140875 | 8.8% |
| l | 107452 | 6.7% |
| m | 70722 | 4.4% |
| b | 63823 | 4.0% |
| a | 52291 | 3.3% |
| Other values (5) | 140513 | 8.8% |
same_of_a_kind
Numeric
| Distinct | 8 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.29203 |
| Minimum | 0 |
| Maximum | 9 |
| Zeros | 182027 |
| Zeros (%) | 91.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 3 |
| Maximum | 9 |
| Range | 9 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.9424504869 |
|---|---|
| Coefficient of variation (CV) | 3.227238595 |
| Kurtosis | 7.923821423 |
| Mean | 0.29203 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.047644246 |
| Sum | 58406 |
| Variance | 0.8882129202 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 182027 | 91.0% |
| 3 | 14103 | 7.1% |
| 4 | 3366 | 1.7% |
| 5 | 419 | 0.2% |
| 6 | 64 | < 0.1% |
| 7 | 15 | < 0.1% |
| 8 | 5 | < 0.1% |
| 9 | 1 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 182027 | 91.0% |
| 3 | 14103 | 7.1% |
| 4 | 3366 | 1.7% |
| 5 | 419 | 0.2% |
| 6 | 64 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9 | 1 | < 0.1% |
| 8 | 5 | < 0.1% |
| 7 | 15 | < 0.1% |
| 6 | 64 | < 0.1% |
| 5 | 419 | 0.2% |
straight
Numeric
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.16532 |
| Minimum | 0 |
| Maximum | 7 |
| Zeros | 189488 |
| Zeros (%) | 94.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 3 |
| Maximum | 7 |
| Range | 7 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.7085914254 |
|---|---|
| Coefficient of variation (CV) | 4.286180894 |
| Kurtosis | 16.21791883 |
| Mean | 0.16532 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.173876487 |
| Sum | 33064 |
| Variance | 0.5021018081 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 189488 | 94.7% |
| 3 | 9222 | 4.6% |
| 4 | 1098 | 0.5% |
| 5 | 153 | 0.1% |
| 6 | 32 | < 0.1% |
| 7 | 7 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 189488 | 94.7% |
| 3 | 9222 | 4.6% |
| 4 | 1098 | 0.5% |
| 5 | 153 | 0.1% |
| 6 | 32 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 7 | < 0.1% |
| 6 | 32 | < 0.1% |
| 5 | 153 | 0.1% |
| 4 | 1098 | 0.5% |
| 3 | 9222 | 4.6% |
same_of_a_kind_middle
Numeric
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.48057 |
| Minimum | 0 |
| Maximum | 8 |
| Zeros | 170845 |
| Zeros (%) | 85.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 3 |
| Maximum | 8 |
| Range | 8 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.186439848 |
|---|---|
| Coefficient of variation (CV) | 2.468817962 |
| Kurtosis | 3.453718664 |
| Mean | 0.48057 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.211302446 |
| Sum | 96114 |
| Variance | 1.407639513 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 170845 | 85.4% |
| 3 | 22574 | 11.3% |
| 4 | 4799 | 2.4% |
| 5 | 1530 | 0.8% |
| 6 | 219 | 0.1% |
| 7 | 32 | < 0.1% |
| 8 | 1 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 170845 | 85.4% |
| 3 | 22574 | 11.3% |
| 4 | 4799 | 2.4% |
| 5 | 1530 | 0.8% |
| 6 | 219 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8 | 1 | < 0.1% |
| 7 | 32 | < 0.1% |
| 6 | 219 | 0.1% |
| 5 | 1530 | 0.8% |
| 4 | 4799 | 2.4% |
straight_middle
Numeric
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.19016 |
| Minimum | 0 |
| Maximum | 7 |
| Zeros | 188280 |
| Zeros (%) | 94.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 3 |
| Maximum | 7 |
| Range | 7 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.7750239852 |
|---|---|
| Coefficient of variation (CV) | 4.075641487 |
| Kurtosis | 15.37875478 |
| Mean | 0.19016 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.026287051 |
| Sum | 38032 |
| Variance | 0.6006621777 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 188280 | 94.1% |
| 3 | 9632 | 4.8% |
| 4 | 1398 | 0.7% |
| 5 | 608 | 0.3% |
| 6 | 70 | < 0.1% |
| 7 | 12 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 188280 | 94.1% |
| 3 | 9632 | 4.8% |
| 4 | 1398 | 0.7% |
| 5 | 608 | 0.3% |
| 6 | 70 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 12 | < 0.1% |
| 6 | 70 | < 0.1% |
| 5 | 608 | 0.3% |
| 4 | 1398 | 0.7% |
| 3 | 9632 | 4.8% |
fortune
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.5 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 200000 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 182087 | 91.0% |
| 1 | 17913 | 9.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 182087 | 91.0% |
| 1 | 17913 | 9.0% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 200000 | 100.0% |
Most frequent character per category
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 182087 | 91.0% |
| 1 | 17913 | 9.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 200000 | 100.0% |
Most frequent character per script
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 182087 | 91.0% |
| 1 | 17913 | 9.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 200000 | 100.0% |
Most frequent character per block
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 182087 | 91.0% |
| 1 | 17913 | 9.0% |
wealth
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.5 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 200000 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 2 |
| 5th row | 0 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 181325 | 90.7% |
| 1 | 9369 | 4.7% |
| 2 | 9306 | 4.7% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 181325 | 90.7% |
| 1 | 9369 | 4.7% |
| 2 | 9306 | 4.7% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 200000 | 100.0% |
Most frequent character per category
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 181325 | 90.7% |
| 1 | 9369 | 4.7% |
| 2 | 9306 | 4.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 200000 | 100.0% |
Most frequent character per script
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 181325 | 90.7% |
| 1 | 9369 | 4.7% |
| 2 | 9306 | 4.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 200000 | 100.0% |
Most frequent character per block
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 181325 | 90.7% |
| 1 | 9369 | 4.7% |
| 2 | 9306 | 4.7% |
land
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.5 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 200000 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 192510 | 96.3% |
| 2 | 6061 | 3.0% |
| 1 | 1429 | 0.7% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 192510 | 96.3% |
| 2 | 6061 | 3.0% |
| 1 | 1429 | 0.7% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 200000 | 100.0% |
Most frequent character per category
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 192510 | 96.3% |
| 2 | 6061 | 3.0% |
| 1 | 1429 | 0.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 200000 | 100.0% |
Most frequent character per script
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 192510 | 96.3% |
| 2 | 6061 | 3.0% |
| 1 | 1429 | 0.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 200000 | 100.0% |
Most frequent character per block
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 192510 | 96.3% |
| 2 | 6061 | 3.0% |
| 1 | 1429 | 0.7% |
taxi
Numeric
| Distinct | 9 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.29921 |
| Minimum | 0 |
| Maximum | 8 |
| Zeros | 174881 |
| Zeros (%) | 87.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 3 |
| Maximum | 8 |
| Range | 8 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.9406422274 |
|---|---|
| Coefficient of variation (CV) | 3.14375264 |
| Kurtosis | 13.73052065 |
| Mean | 0.29921 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.662991601 |
| Sum | 59842 |
| Variance | 0.8848077999 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 174881 | 87.4% |
| 1 | 9596 | 4.8% |
| 3 | 5433 | 2.7% |
| 2 | 5300 | 2.6% |
| 5 | 3534 | 1.8% |
| 4 | 957 | 0.5% |
| 6 | 260 | 0.1% |
| 7 | 23 | < 0.1% |
| 8 | 16 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 174881 | 87.4% |
| 1 | 9596 | 4.8% |
| 2 | 5300 | 2.6% |
| 3 | 5433 | 2.7% |
| 4 | 957 | 0.5% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8 | 16 | < 0.1% |
| 7 | 23 | < 0.1% |
| 6 | 260 | 0.1% |
| 5 | 3534 | 1.8% |
| 4 | 957 | 0.5% |
birth_date
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.5 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 200000 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 175690 | 87.8% |
| 6 | 22473 | 11.2% |
| 8 | 1837 | 0.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 175690 | 87.8% |
| 6 | 22473 | 11.2% |
| 8 | 1837 | 0.9% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 200000 | 100.0% |
Most frequent character per category
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 175690 | 87.8% |
| 6 | 22473 | 11.2% |
| 8 | 1837 | 0.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 200000 | 100.0% |
Most frequent character per script
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 175690 | 87.8% |
| 6 | 22473 | 11.2% |
| 8 | 1837 | 0.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 200000 | 100.0% |
Most frequent character per block
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 175690 | 87.8% |
| 6 | 22473 | 11.2% |
| 8 | 1837 | 0.9% |
mirror
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.5 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 200000 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 187280 | 93.6% |
| 2 | 11256 | 5.6% |
| 3 | 1371 | 0.7% |
| 4 | 93 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 187280 | 93.6% |
| 2 | 11256 | 5.6% |
| 3 | 1371 | 0.7% |
| 4 | 93 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 200000 | 100.0% |
Most frequent character per category
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 187280 | 93.6% |
| 2 | 11256 | 5.6% |
| 3 | 1371 | 0.7% |
| 4 | 93 | < 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 200000 | 100.0% |
Most frequent character per script
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 187280 | 93.6% |
| 2 | 11256 | 5.6% |
| 3 | 1371 | 0.7% |
| 4 | 93 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 200000 | 100.0% |
Most frequent character per block
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 187280 | 93.6% |
| 2 | 11256 | 5.6% |
| 3 | 1371 | 0.7% |
| 4 | 93 | < 0.1% |
legacy
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.5 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 200000 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 1 |
| 5th row | 0 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 170587 | 85.3% |
| 1 | 29413 | 14.7% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 170587 | 85.3% |
| 1 | 29413 | 14.7% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 200000 | 100.0% |
Most frequent character per category
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 170587 | 85.3% |
| 1 | 29413 | 14.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 200000 | 100.0% |
Most frequent character per script
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 170587 | 85.3% |
| 1 | 29413 | 14.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 200000 | 100.0% |
Most frequent character per block
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 170587 | 85.3% |
| 1 | 29413 | 14.7% |
memorable
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.5 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 200000 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 178877 | 89.4% |
| 1 | 10045 | 5.0% |
| 2 | 7210 | 3.6% |
| 3 | 3868 | 1.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 178877 | 89.4% |
| 1 | 10045 | 5.0% |
| 2 | 7210 | 3.6% |
| 3 | 3868 | 1.9% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 200000 | 100.0% |
Most frequent character per category
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 178877 | 89.4% |
| 1 | 10045 | 5.0% |
| 2 | 7210 | 3.6% |
| 3 | 3868 | 1.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 200000 | 100.0% |
Most frequent character per script
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 178877 | 89.4% |
| 1 | 10045 | 5.0% |
| 2 | 7210 | 3.6% |
| 3 | 3868 | 1.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 200000 | 100.0% |
Most frequent character per block
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 178877 | 89.4% |
| 1 | 10045 | 5.0% |
| 2 | 7210 | 3.6% |
| 3 | 3868 | 1.9% |
reserve
Numeric
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.032885 |
| Minimum | 0 |
| Maximum | 8 |
| Zeros | 197897 |
| Zeros (%) | 98.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 8 |
| Range | 8 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.3227446322 |
|---|---|
| Coefficient of variation (CV) | 9.814341863 |
| Kurtosis | 107.1418722 |
| Mean | 0.032885 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 10.10661549 |
| Sum | 6577 |
| Variance | 0.1041640976 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 197897 | 98.9% |
| 3 | 1911 | 1.0% |
| 4 | 151 | 0.1% |
| 6 | 17 | < 0.1% |
| 5 | 16 | < 0.1% |
| 7 | 6 | < 0.1% |
| 8 | 2 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 197897 | 98.9% |
| 3 | 1911 | 1.0% |
| 4 | 151 | 0.1% |
| 5 | 16 | < 0.1% |
| 6 | 17 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8 | 2 | < 0.1% |
| 7 | 6 | < 0.1% |
| 6 | 17 | < 0.1% |
| 5 | 16 | < 0.1% |
| 4 | 151 | 0.1% |
reserve_middle
Numeric
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.12792 |
| Minimum | 0 |
| Maximum | 7 |
| Zeros | 191713 |
| Zeros (%) | 95.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 7 |
| Range | 7 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.6195469254 |
|---|---|
| Coefficient of variation (CV) | 4.843237378 |
| Kurtosis | 21.39643248 |
| Mean | 0.12792 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.745225571 |
| Sum | 25584 |
| Variance | 0.3838383928 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 191713 | 95.9% |
| 3 | 7715 | 3.9% |
| 4 | 463 | 0.2% |
| 5 | 72 | < 0.1% |
| 6 | 32 | < 0.1% |
| 7 | 5 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 191713 | 95.9% |
| 3 | 7715 | 3.9% |
| 4 | 463 | 0.2% |
| 5 | 72 | < 0.1% |
| 6 | 32 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 5 | < 0.1% |
| 6 | 32 | < 0.1% |
| 5 | 72 | < 0.1% |
| 4 | 463 | 0.2% |
| 3 | 7715 | 3.9% |
repeat
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.5 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 200000 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 191248 | 95.6% |
| 1 | 6242 | 3.1% |
| 2 | 2510 | 1.3% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 191248 | 95.6% |
| 1 | 6242 | 3.1% |
| 2 | 2510 | 1.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 200000 | 100.0% |
Most frequent character per category
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 191248 | 95.6% |
| 1 | 6242 | 3.1% |
| 2 | 2510 | 1.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 200000 | 100.0% |
Most frequent character per script
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 191248 | 95.6% |
| 1 | 6242 | 3.1% |
| 2 | 2510 | 1.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 200000 | 100.0% |
Most frequent character per block
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 191248 | 95.6% |
| 1 | 6242 | 3.1% |
| 2 | 2510 | 1.3% |
last_number
Numeric
| Distinct | 10 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.72672 |
| Minimum | 0 |
| Maximum | 9 |
| Zeros | 11894 |
| Zeros (%) | 5.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 3 |
| median | 6 |
| Q3 | 8 |
| 95-th percentile | 9 |
| Maximum | 9 |
| Range | 9 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 2.88038878 |
|---|---|
| Coefficient of variation (CV) | 0.502973566 |
| Kurtosis | -0.9523011845 |
| Mean | 5.72672 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.5458427771 |
| Sum | 1145344 |
| Variance | 8.296639525 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 9 | 42880 | 21.4% |
| 8 | 34739 | 17.4% |
| 6 | 28169 | 14.1% |
| 7 | 14951 | 7.5% |
| 5 | 14515 | 7.3% |
| 4 | 14139 | 7.1% |
| 3 | 13661 | 6.8% |
| 2 | 12675 | 6.3% |
| 1 | 12377 | 6.2% |
| 0 | 11894 | 5.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 11894 | 5.9% |
| 1 | 12377 | 6.2% |
| 2 | 12675 | 6.3% |
| 3 | 13661 | 6.8% |
| 4 | 14139 | 7.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9 | 42880 | 21.4% |
| 8 | 34739 | 17.4% |
| 7 | 14951 | 7.5% |
| 6 | 28169 | 14.1% |
| 5 | 14515 | 7.3% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
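The calculation described above can be sketched in a few lines of plain Python. This is an illustration on made-up data, not the code used to produce the report, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


# Made-up illustrative data, not taken from the dataset
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(xs, ys)

# Shifting and rescaling each variable separately leaves r unchanged
r_rescaled = pearson_r([10 * x + 3 for x in xs], [0.5 * y - 7 for y in ys])
print(r, r_rescaled)
```

The second call illustrates the invariance under changes in location and scale: both calls give the same value up to floating-point error.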
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
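As a minimal sketch of that definition, the values can be replaced by their ranks and then passed through the Pearson formula. The helper names are hypothetical, the data is made up, and ties are assumed to receive the average of the positions they occupy.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Made-up data: y grows monotonically but not linearly with x
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(rho, r)
```

On this example ρ is 1 up to floating-point error, while r stays below 1, which is the sense in which ρ captures nonlinear monotonic relationships.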
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
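The pair-counting definition can be sketched directly. This is the simple form stated in the text, in which tied pairs count as neither concordant nor discordant. Tie-adjusted variants exist and may give different values on heavily tied data such as the zero-dominated columns here. The function name and data are illustrative.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs, without tie adjustment."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    return (concordant - discordant) / len(pairs)


# Made-up data: one swapped pair out of six
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
tau = kendall_tau(xs, ys)
print(tau)
```

With four observations there are six pairs, of which five are concordant and one is discordant, so τ is (5 - 1) / 6.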
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available in the Phik project.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
First rows
| | price_vnd | sim_number | provider | same_of_a_kind | straight | same_of_a_kind_middle | straight_middle | fortune | wealth | land | taxi | birth_date | mirror | legacy | memorable | reserve | reserve_middle | repeat | last_number |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 450000 | 343189288 | viettel | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 |
| 1 | 3000000 | 888899580 | vinaphone | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 500000 | 928960006 | vietnamobile | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 |
| 3 | 5000000 | 902438679 | mobifone | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 9 |
| 4 | 450000 | 334307889 | viettel | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 |
| 5 | 450000 | 328190680 | viettel | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 12000000 | 926052005 | vietnamobile | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| 7 | 3000000 | 921220924 | vietnamobile | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 |
| 8 | 500000 | 877001866 | itelecom | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 |
| 9 | 199000000 | 769889999 | mobifone | 4 | 0 | 3 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 2 | 9 |
Last rows
| | price_vnd | sim_number | provider | same_of_a_kind | straight | same_of_a_kind_middle | straight_middle | fortune | wealth | land | taxi | birth_date | mirror | legacy | memorable | reserve | reserve_middle | repeat | last_number |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 199990 | 450000 | 336083085 | viettel | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 5 |
| 199991 | 10300000 | 888883235 | vinaphone | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| 199992 | 500000 | 926290568 | vietnamobile | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 |
| 199993 | 1000000 | 945510776 | vinaphone | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 |
| 199994 | 3000000 | 977622004 | viettel | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 1 | 0 | 0 | 0 | 0 | 4 |
| 199995 | 450000 | 866161769 | viettel | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 |
| 199996 | 1000000 | 708124126 | mobifone | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 6 |
| 199997 | 1000000 | 904755200 | mobifone | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 199998 | 450000 | 329220204 | viettel | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 4 |
| 199999 | 1000000 | 904760321 | mobifone | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 3 | 0 | 0 | 1 |